stdin And stdout

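If you run a Python script from the command line, you can pipe data through it by reading from sys.stdin and writing to sys.stdout. For example, here is a script that writes out only the lines of its input that match a regular expression, followed by one that counts the lines it receives: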



In [1]:

    
# egrep.py
import sys, re

# sys.argv is the list of command-line arguments
# sys.argv[0] is the name of the program itself
# sys.argv[1] will be the regex specified at the command line
regex = sys.argv[1]

# for every line passed into the script
for line in sys.stdin:
    # if it matches the regex, write it to stdout
    if re.search(regex, line):
        sys.stdout.write(line)



In [9]:

    
# line_count.py
import sys

count = 0
for line in sys.stdin:
    count += 1

# print goes to sys.stdout
print(count)



In [7]:

    
# Windows; on macOS/Linux use: !cat SomeFile.txt | python egrep.py "[0-9]" | python line_count.py
!type SomeFile.txt | python egrep.py "[0-9]" | python line_count.py









    



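For this pipeline to work, egrep.py and line_count.py must be saved as files in the working directory; otherwise Python fails with "can't open file". When they are present, the pipeline prints the number of lines in SomeFile.txt that contain a digit.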
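To try the filter-and-count logic without a shell or saved scripts, one option is to move the two loop bodies into functions that accept any iterable of lines, and let io.StringIO stand in for sys.stdin. This is a minimal sketch; the function names egrep and line_count are hypothetical, chosen to mirror the script names above.

```python
import io
import re
from typing import Iterable, Iterator


def egrep(regex: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines containing a match for regex (like egrep.py)."""
    for line in lines:
        if re.search(regex, line):
            yield line


def line_count(lines: Iterable[str]) -> int:
    """Count the lines received (like line_count.py)."""
    return sum(1 for _ in lines)


# io.StringIO behaves like a text file, so it can play the role of sys.stdin
fake_stdin = io.StringIO("line 1\nno digits here\nline 3\n")
print(line_count(egrep("[0-9]", fake_stdin)))  # prints 2
```

Chaining the generator from egrep into line_count mirrors the shell pipe: each line flows from one stage to the next without the whole input being held in memory.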

Reading Files

The Basics Of Text Files

Call the built-in open function with a filename and a mode to get a file object:



In [10]:

    
# 'r' means read-only (raises FileNotFoundError if the file doesn't exist)
file_for_reading = open('reading_file.txt', 'r')

# 'w' is write: it will destroy the file if it already exists!
file_for_writing = open('writing_file.txt', 'w')

# 'a' is append: for adding to the end of the file
file_for_appending = open('appending_file.txt', 'a')

# don't forget to close your files when you're done
file_for_reading.close()
file_for_writing.close()
file_for_appending.close()









    



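The difference between 'w' and 'a' is easiest to see by writing to the same file several times. The sketch below works in a fresh temporary directory so no existing files are touched; the helper write_text and the file name modes_demo.txt are hypothetical.

```python
import os
import tempfile


def write_text(path: str, mode: str, text: str) -> None:
    """Open path in the given mode, write text, and close the file explicitly."""
    f = open(path, mode)
    f.write(text)
    f.close()


# a fresh temporary directory, so nothing real gets overwritten
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")  # hypothetical file name

write_text(path, "w", "first\n")   # 'w' creates the file
write_text(path, "w", "second\n")  # 'w' again: the old contents ("first") are wiped
write_text(path, "a", "third\n")   # 'a' keeps the contents and adds to the end

f = open(path, "r")
lines = f.read().splitlines()
f.close()
print(lines)  # ['second', 'third']
```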

Because it's easy to forget to close a file, use a with block, which closes the file automatically when the block ends:



In [15]:

    
with open('SomeFile.txt', 'r') as f:
    for line in f:
        print(line.strip())

# once the with block ends, the file is closed automatically









    



line 1
line 2
line 3
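You can check the automatic closing through the file object's closed attribute. This sketch writes to a temporary file (the name with_demo.txt is hypothetical) and also shows that the file is closed even when the block exits because of an exception.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical file name

with open(path, "w") as f:
    f.write("line 1\nline 2\n")
    closed_inside = f.closed   # False: still open inside the block
closed_after = f.closed        # True: closed on leaving the block
print(closed_inside, closed_after)  # False True

# the file is also closed if the block is exited by an exception
try:
    with open(path) as g:
        raise ValueError("simulated failure")
except ValueError:
    pass
print(g.closed)  # True
```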

Delimited Files

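Data often comes in delimited files, such as comma- or tab-separated values. Rather than splitting each line yourself, use Python's csv module, which handles delimiters and quoting for you: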



In [64]:

    
import csv

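# the csv documentation recommends opening files with newline=''
with open('stocks.csv', 'r', newline='') as f:
    reader = csv.reader(f, delimiter=',')  # ',' is the default; shown for clarity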
    for row in reader:
        date = row[0]
        symbol = row[1]
        closing_price = float(row[2])
        print(date, symbol, closing_price)









    



6/20/2014 AAPL 90.91
6/20/2014 MSFT 41.68
6/20/2014 FB 64.5



In [65]:

    
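# this file uses ':' as its delimiter; DictReader takes each row's keys from the header line
with open('stocks-headers.csv', 'r', newline='') as f:
    reader = csv.DictReader(f, delimiter=':')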
    for row in reader:
        date = row['date']
        symbol = row['symbol']
        closing_price = float(row['closing_price'])
        print(date, symbol, closing_price)









    



6/20/2014 AAPL 90.91
6/20/2014 MSFT 41.68
6/20/2014 FB 64.5
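Since stocks.csv and stocks-headers.csv may not be on hand, here is a self-contained sketch that feeds the same rows (taken from the outputs above) to csv.reader and csv.DictReader through io.StringIO. It assumes, as the delimiter=':' argument above suggests, that the header file is colon-delimited. The last lines show why hand-splitting on commas is fragile: a quoted field can contain the delimiter.

```python
import csv
import io

# in-memory stand-ins for the two files above
plain = io.StringIO("6/20/2014,AAPL,90.91\n6/20/2014,MSFT,41.68\n")
with_header = io.StringIO("date:symbol:closing_price\n6/20/2014:FB:64.5\n")

rows = [(date, symbol, float(price)) for date, symbol, price in csv.reader(plain)]
print(rows)  # [('6/20/2014', 'AAPL', 90.91), ('6/20/2014', 'MSFT', 41.68)]

# DictReader uses the header line as the keys for every following row
header_rows = [(row["symbol"], float(row["closing_price"]))
               for row in csv.DictReader(with_header, delimiter=":")]
print(header_rows)  # [('FB', 64.5)]

# line.split(',') would break this row into three pieces; csv keeps the quoted field whole
print(next(csv.reader(io.StringIO('"Smith, J.",42\n'))))  # ['Smith, J.', '42']
```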

HTML And The Parsing Thereof



In [45]:

    
some_html = """
<html>
<head>
<title>A web page</title>
</head>
<body>
<p id="author">Joel Grus</p>
<p id="subject" class="important">Data Science</p>
</body>
</html>
"""



In [46]:

    
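from bs4 import BeautifulSoup
import requests

# to parse a live page, fetch its HTML first:
# html = requests.get('http://www.example.com').text
# here we parse the sample document defined above instead
html = some_html
soup = BeautifulSoup(html, 'html5lib')  # the 'html5lib' parser requires the html5lib package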



In [47]:

    
first_paragraph = soup.find('p')
first_paragraph









    Out[47]:





<p id="author">Joel Grus</p>



In [48]:

    
soup.p.text, soup.p.text.split()









    Out[48]:





('Joel Grus', ['Joel', 'Grus'])



In [49]:

    
soup.p['id']









    Out[49]:





'author'



In [50]:

    
soup.p.get('id')









    Out[50]:





'author'



In [51]:

    
soup.find_all('p')









    Out[51]:





[<p id="author">Joel Grus</p>,
 <p class="important" id="subject">Data Science</p>]



In [52]:

    
[p for p in soup('p') if p.get('id')]









    Out[52]:





[<p id="author">Joel Grus</p>,
 <p class="important" id="subject">Data Science</p>]



In [53]:

    
soup('p', {'class' : 'important'})









    Out[53]:





[<p class="important" id="subject">Data Science</p>]



In [54]:

    
soup('p', 'important')









    Out[54]:





[<p class="important" id="subject">Data Science</p>]



In [56]:

    
[p for p in soup('p') if 'important' in p.get('class', [])]









    Out[56]:





[<p class="important" id="subject">Data Science</p>]
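To get a feel for what queries like soup('p', 'important') compute, here is a much cruder sketch built on the standard library's html.parser instead of BeautifulSoup (a swapped-in technique: it builds no tree and does no repair of broken HTML). The class ParagraphCollector is hypothetical; it just records the attributes and text of each <p> element so the filters above can be rewritten as plain list comprehensions.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class ParagraphCollector(HTMLParser):
    """Record the attributes and text content of each <p> element."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[Tuple[Dict[str, Optional[str]], str]] = []
        self._attrs: Optional[Dict[str, Optional[str]]] = None  # set while inside a <p>
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
            self._text = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._attrs is not None:
            self.paragraphs.append((self._attrs, "".join(self._text)))
            self._attrs = None


sample_html = """<html><body>
<p id="author">Joel Grus</p>
<p id="subject" class="important">Data Science</p>
</body></html>"""

collector = ParagraphCollector()
collector.feed(sample_html)

# like [p for p in soup('p') if p.get('id')]
with_ids = [text for attrs, text in collector.paragraphs if attrs.get("id")]
# like soup('p', 'important'); class can hold several space-separated names
important = [text for attrs, text in collector.paragraphs
             if "important" in (attrs.get("class") or "").split()]
print(with_ids)   # ['Joel Grus', 'Data Science']
print(important)  # ['Data Science']
```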

Using APIs

JSON (And XML)

Because HTTP is a protocol for transferring text, the data you request through a web API has to be serialized into a string format, most often JSON and sometimes XML.



In [59]:

    
import json

json_string = """{ "title" : "Data Science Book",
                   "author" : "Joel Grus",
                   "publicationYear" : 2014,
                   "topics" : [ "data", "science", "data science"] }"""

# parse the JSON into a Python dict
# (avoid naming it dict, which would shadow the built-in type)
deserialized = json.loads(json_string)
if 'data science' in deserialized['topics']:
    print(deserialized)









    



{'title': 'Data Science Book', 'author': 'Joel Grus', 'publicationYear': 2014, 'topics': ['data', 'science', 'data science']}
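Serialization also runs the other way: json.dumps turns a Python object into a JSON string, and json.loads turns it back. A short round-trip sketch using a subset of the fields above:

```python
import json

book = {"title": "Data Science Book", "publicationYear": 2014}

serialized = json.dumps(book)          # Python dict -> JSON text (a str)
print(serialized)  # {"title": "Data Science Book", "publicationYear": 2014}

deserialized = json.loads(serialized)  # JSON text -> Python dict
print(deserialized == book)  # True
```

Note that JSON always uses double quotes, while Python's repr of the same dict (as printed above) uses single quotes.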

Using An Unauthenticated API



In [ ]:

    
endpoint = 'https://api.github.com/users/joelgrus/repos'
repos = json.loads(requests.get(endpoint).text)
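The parsed response is a list of dicts, one per repository, so ordinary Python tools apply. As a sketch of one possible next step, the code below counts repositories by language with collections.Counter. Because the live request needs network access, it runs on a made-up list shaped like the response: the "name" and "language" keys are fields GitHub returns for each repository, but every value here is a placeholder, not real data.

```python
from collections import Counter

# placeholder data standing in for the parsed API response; not real repositories
repos = [
    {"name": "repo-a", "language": "Python"},
    {"name": "repo-b", "language": "JavaScript"},
    {"name": "repo-c", "language": "Python"},
    {"name": "repo-d", "language": None},  # JSON null when no language is detected
]

# skip repositories without a detected language
language_counts = Counter(repo["language"] for repo in repos if repo["language"])
print(language_counts.most_common())  # [('Python', 2), ('JavaScript', 1)]
```

With requests, `requests.get(endpoint).json()` is a shortcut equivalent to calling json.loads on the response's text.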



In [ ]: